ABSTRACT


This paper considers an infinite-stage linear decision problem with random coefficients. We assume that the randomness can be described by a finite Markov chain. Under certain assumptions we are able to calculate an upper bound on the optimal value of the decision problem and to use that bound to determine a useful initial decision.


A NEW  APPROACH  TO  MULTI-STAGE  STOCHASTIC  LINEAR  PROGRAMS 




by 

Richard  C.  Grinold 


1.  INTRODUCTION 

This paper presents a novel and, we hope, useful way of looking at multi-stage stochastic linear decision problems. We assume that the parameters that govern the evolution of the system are random variables with finite range and that the values of these parameters are determined by the state of a finite Markov chain. This assumption restricts the general case of stochastic linear programs; however, the loss in generality is offset by an ability to perform useful computations.

Our discrete time system can be viewed as a two-stage decision process: the initial decision followed by all future decisions. The initial decision is subject to known constraints, leads to a known expected reward, and produces a random input into the second stage of the decision process. Our procedure, in effect, calculates an upper bound for the expected present value of the random input into the second stage of the decision process. If we assume that this upper bound is a reasonable approximation for the value of the input, then we are able to calculate the initial decision that maximizes the expected first period reward plus the expected present value of all future decisions. Notice that the number of future decision stages is not important; in fact, there can be an infinite number of future decisions.

Section 2 describes the decision process in detail while Section 3 defines the set of feasible policies. In Section 4 we see that each feasible policy leads to a sequence of conditional expectations of future decisions and a solution to an infinite horizon linear programming problem. In Section 5 we develop an upper bound for the infinite horizon linear program and in Section 6 we use this knowledge to construct an initial feasible decision for the stochastic program. Section 7 is a summary and an outline of some interesting open questions.

The paper is based directly on two streams of thought. The first is the study and solution of infinite horizon linear programs by Manne [6], Hopkins [5], Grinold and Hopkins [4], and Evers [1]. In particular, Assumption III in Section 5 and its consequences are based on ideas proposed by Evers [1]. The second is from earlier papers [2], [3] on dynamic stochastic decision processes in which the Markovian assumption was first proposed and exploited. An indirect and undoubtedly more important source has been the pleasure of learning from Roger Wets [7].


2.  THE  MODEL 


This section presents a description of the stochastic decision process, introduces notation and defines terminology. The first part of the section describes a relatively simple model while the second part shows how more general models can be reduced to the same simple form.

We observe a system at discrete points in time $t = 0, 1, 2, \ldots$. At time $t$ the system can be described by an $(m+1)$-dimensional vector $(s_t, i_t)$, where $s_t \in R^m$ and $i_t \in \{1, 2, \ldots, k\}$. We refer to $s_t$ as the vector-state and $i_t$ as the index-state. The notation $\{s_t\}$ or $\{i_t\}$ refers to the sequence $s_t$ or $i_t$ for $t \ge 0$.

Given state $(s_t, i_t)$ at time $t$, the possible decisions $u_t$ are constrained by

(1)  $A(i_t)\,u_t = s_t, \quad u_t \ge 0$

where $A(i)$ is an $m \times n$ matrix. Selection of a decision $u_t$ results in a reward with expected present (time zero) value

(2)  $\beta^t c(i_t)\,u_t$

where $\beta > 0$ is a discount factor.

The new index-state at time $t+1$ is determined by the transition probabilities of a finite Markov chain. Thus

(3)  $\mathrm{Prob}[\,i_{t+1} = j \mid i_t = i\,] = p_{ij}$ .

Given the transition from $i_t$ to $i_{t+1}$, the new vector-state at time $t+1$ is described by the linear relation

(4)  $s_{t+1} = K(i_t, i_{t+1})\,u_t$ .

To avoid a conceptual and theoretical difficulty we shall assume that the system cannot reach a situation where (1) has no feasible solutions.


Assumption I:

If $p_{ij} > 0$ and $v \ge 0$, then there exists a $u$ satisfying

(5)  $A(j)\,u = K(i,j)\,v, \quad u \ge 0$ .


We  now  consider  several  possible  generalizations  of  the  model  described 
above  and  indicate  how  they  can  be  reduced  to  the  simple  case. 

First, suppose $i_t = i$, $i_{t+1} = j$ and $s_{t+1} = K(i,j)\,u_t + d(i,j)$. Then we could first expand the vector-state by one dimension and write the transition in the purely linear form of the simple model.
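
One augmentation that is consistent with this description can be sketched numerically. The construction is an assumption, not taken from the text: it appends a constant component 1 to the vector-state and an extra decision component that the constraint forces to equal 1, so that d(i,j) becomes a column of an enlarged K matrix. The helper names `matvec` and `augment` and all numbers are illustrative.

```python
# Hypothetical augmentation: state (s, 1), decision (u, u0) with u0 forced to 1.

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def augment(A, K, d):
    """Return (A_hat, K_hat) so that K_hat (u, 1) = (K u + d, 1)."""
    n = len(A[0])
    A_hat = [row + [0.0] for row in A] + [[0.0] * n + [1.0]]
    K_hat = [row + [di] for row, di in zip(K, d)] + [[0.0] * n + [1.0]]
    return A_hat, K_hat

A = [[1.0, 1.0]]          # m = 1, n = 2 (illustrative numbers)
K = [[2.0, 0.5]]
d = [3.0]
u = [0.25, 0.75]          # a decision with A u = s = [1.0]

A_hat, K_hat = augment(A, K, d)
u_hat = u + [1.0]                     # extra decision component equal to 1
s_hat = matvec(A_hat, u_hat)          # augmented current state (s, 1)
s_next_hat = matvec(K_hat, u_hat)     # augmented next state (K u + d, 1)
affine_next = [x + y for x, y in zip(matvec(K, u), d)]
```

The last row of `A_hat` is what pins the extra decision component to the constant state component, so the enlarged system has exactly the form of (1) together with a purely linear transition.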


As a second variation, suppose

$s_{t+1} = D(i,j)\,s_t + K(i,j)\,u_t$ .

Define $\bar{K}(i,j) = D(i,j)\,A(i) + K(i,j)$; since $A(i)\,u_t = s_t$ we have

$s_{t+1} = [D(i,j)\,A(i) + K(i,j)]\,u_t = \bar{K}(i,j)\,u_t$ .
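
The substitution in the second variation can be checked on a small instance. The sketch below uses made-up matrices and the hypothetical helpers `matvec` and `matmul`; it compares the direct transition D s + K u with the reduced form (D A + K) u for a decision that satisfies A u = s.

```python
def matvec(M, v):
    """Matrix times vector, with matrices stored as lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def matmul(X, Y):
    """Matrix product X Y."""
    return [[sum(X[r][q] * Y[q][c] for q in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

A = [[1.0, 2.0]]      # A(i), 1 x 2
D = [[3.0]]           # D(i, j), 1 x 1
K = [[1.0, -1.0]]     # K(i, j), 1 x 2
u = [0.5, 0.25]       # any decision; the state is then s = A u

s = matvec(A, u)
direct = [a + b for a, b in zip(matvec(D, s), matvec(K, u))]

# Reduced matrix D(i, j) A(i) + K(i, j), applied to u alone.
K_bar = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(matmul(D, A), K)]
reduced = matvec(K_bar, u)
```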


For a third extension, suppose $A(j)$ is $m_j \times n$ and

$A(j)\,u_{t+1} = H(j)\,s_{t+1}$

where $H(j)$ has full row rank. Then (by suitably interchanging columns) we can write $H(j) = B(j)[I, N(j)]$ where $B(j)$ has an inverse. Let $s = s_{t+1}$ and $u^1 = u_{t+1}$, and partition $s = (s^1, s^2)$; thus $H(j)\,s = B(j)\,s^1 + B(j)N(j)\,s^2$, so $B(j)^{-1}A(j)\,u^1 = s^1 + N(j)\,s^2$. Now expand the control space and define $u^2 - u^3 = s^2$. We have

$\begin{pmatrix} B(j)^{-1}A(j) & -N(j) & N(j) \\ 0 & I & -I \end{pmatrix} \begin{pmatrix} u^1 \\ u^2 \\ u^3 \end{pmatrix} = \begin{pmatrix} s^1 \\ s^2 \end{pmatrix}, \quad u^1, u^2, u^3 \ge 0$ .
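
A small numerical check of the third extension, under assumed numbers: the block `B_inv_A` stands for $B(j)^{-1}A(j)$, the sign-free part $s^2$ is split into its positive and negative parts, and the augmented matrix is applied to the stacked decision. All values and the helper `matvec` are illustrative only.

```python
def matvec(M, v):
    """Matrix times vector, with matrices stored as lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

B_inv_A = [[1.0, 2.0]]   # stands for B(j)^{-1} A(j); m_j = 1, n = 2
N = [[3.0]]              # N(j); here s^2 has a single component
u1 = [1.0, 0.5]
s2 = [-0.5]              # unrestricted in sign, so it is split in two parts
u2 = [max(v, 0.0) for v in s2]
u3 = [max(-v, 0.0) for v in s2]

# s^1 is whatever makes B^{-1} A u^1 = s^1 + N s^2 hold.
s1 = [a - b for a, b in zip(matvec(B_inv_A, u1), matvec(N, s2))]

# Augmented constraint matrix [[B^{-1}A, -N, N], [0, I, -I]].
top = [B_inv_A[0] + [-N[0][0], N[0][0]]]
bottom = [[0.0, 0.0, 1.0, -1.0]]
lhs = matvec(top + bottom, u1 + u2 + u3)
```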

A fourth and trivial extension is to allow the dimension of $u_t$ to vary with $i_t$, i.e., $u_t \in R^{n_i}$. However, we can compensate for this by setting $n = \max_i n_i$ and adding zero columns to the $A(i)$ and $K(i,j)$ matrices.

Finally,  we  can  use  all  the  Markov  chain  state  space  expansion  tricks  that 
transform  apparently  time  dependent  stochastic  processes  into  Markov  processes. 

3.  POLICIES

In  this  section  we  define  the  set  of  feasible  policies  and  the  value  functions 
associated  with  them. 

A realization of the system is defined to be the initial state $(s_0, i_0)$ and the sequence $i_t$ for $t \ge 1$. A policy is a function from the set of realizations to the set of decision sequences $\{u_t\}$. To be feasible a policy must have three properties.


(1)
  (i)  It is nonanticipative. Given two realizations with $s_0$ and $i_t$ for $0 \le t \le T$ identical, the selection of $u_T$ will be identical. In other words, the value of $u_T$ is independent of the values of $i_t$ for $t > T$.

  (ii)  The policy satisfies (2; 1), that is, equation (1) of Section 2.

  (iii)  The policy satisfies (2; 4), that is, equation (4) of Section 2.


Note that the initial conditions $(i_0, s_0)$ and the specification of a policy are sufficient to recursively determine $\{u_t\}$ and $\{s_{t+1}\}$ for any realization $\{i_t\}$.

We shall let $\psi$ denote a policy and $\Psi$ the set of feasible policies.

Given $s = s_0$ and $i = i_0$, we can define the value of a policy $\psi$.


(2)
  (i)  $V^T(\psi, s, i) = E\left[\sum_{t=0}^{T} \beta^t c(i_t)\,u_t\right]$

  (ii)  $V^\infty(\psi, s, i) = \liminf_{T \to \infty} V^T(\psi, s, i)$ .


We  can  also  define  the  optimal  value  functions. 


(3)
  (i)  $V^T(s, i) = \sup_{\psi \in \Psi} V^T(\psi, s, i)$

  (ii)  $V(s, i) = \sup_{\psi \in \Psi} V^\infty(\psi, s, i)$ .

4.  CONDITIONAL  EXPECTATIONS

Given a policy $\psi$ and initial conditions $s_0$ and $i_0$, we can define the expected decisions for all $t$ conditional on the value of $i_t$. This section will show how these conditional expectations correspond to feasible solutions of an infinite horizon linear program.

Let 


(1)  $\pi_{ti} = \mathrm{Prob}[\,i_t = i\,]$
     $u_{it} = E[\,u_t \mid i_t = i\,]$
     $s_{it} = E[\,s_t \mid i_t = i\,]$


Note  that 


(2)  $w_{it} = \pi_{ti}\,u_{it}$
     $x_{it} = \pi_{ti}\,s_{it}$
     $E[u_t] = \sum_{i=1}^{k} \pi_{ti}\,u_{it} = \sum_{i=1}^{k} w_{it}$


From  Bayes'  rule  we  can  calculate 


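(3)  $\mathrm{Prob}[\,i_t = i \mid i_{t+1} = j\,] = \dfrac{\pi_{ti}\,p_{ij}}{\pi_{t+1,j}}$

where we interpret 0/0 to be 0. Thus

(4)  $s_{j,t+1} = \sum_{i=1}^{k} \dfrac{\pi_{ti}\,p_{ij}}{\pi_{t+1,j}}\,K(i,j)\,u_{it} = A(j)\,u_{j,t+1}$

or, if we multiply (4) by $\pi_{t+1,j}$, we obtain

(5)  $x_{j,t+1} = \sum_{i=1}^{k} p_{ij}\,K(i,j)\,w_{it} = A(j)\,w_{j,t+1}$ .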
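
The Bayes' rule step can be made concrete with a short computation. The sketch below, with an illustrative three-state chain and hypothetical helper names `step` and `reverse_probs`, propagates the index-state distribution one period forward and forms the reversed conditional probabilities of (3), using the convention 0/0 = 0. Indices are 0-based in the code.

```python
def step(pi, P):
    """Next distribution: pi_{t+1,j} = sum_i pi_{t,i} * p_ij."""
    k = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]

def reverse_probs(pi, P):
    """R[j][i] = Prob[i_t = i | i_{t+1} = j], with 0/0 taken as 0."""
    nxt = step(pi, P)
    k = len(pi)
    return [[pi[i] * P[i][j] / nxt[j] if nxt[j] > 0 else 0.0
             for i in range(k)] for j in range(k)]

# Illustrative chain in which the first state is never re-entered.
P = [[0.0, 0.5, 0.5],
     [0.0, 0.2, 0.8],
     [0.0, 0.6, 0.4]]
pi0 = [1.0, 0.0, 0.0]

pi1 = step(pi0, P)
R0 = reverse_probs(pi0, P)
```

Rows of `R0` that correspond to reachable states sum to one, while the row for an unreachable state is identically zero, which is exactly the role of the 0/0 convention.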


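Define

(6)  $K = \begin{pmatrix} p_{11}K(1,1) & p_{21}K(2,1) & \cdots & p_{k1}K(k,1) \\ p_{12}K(1,2) & p_{22}K(2,2) & \cdots & p_{k2}K(k,2) \\ \vdots & \vdots & & \vdots \\ p_{1k}K(1,k) & p_{2k}K(2,k) & \cdots & p_{kk}K(k,k) \end{pmatrix}, \quad A = \begin{pmatrix} A(1) & 0 & \cdots & 0 \\ 0 & A(2) & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & A(k) \end{pmatrix}$

and

     $x_t = \begin{pmatrix} x_{1t} \\ x_{2t} \\ \vdots \\ x_{kt} \end{pmatrix}, \quad w_t = \begin{pmatrix} w_{1t} \\ w_{2t} \\ \vdots \\ w_{kt} \end{pmatrix}, \quad c = (c(1), c(2), \ldots, c(k))$ .

With these definitions (5) becomes:

(7)  $x_{t+1} = K w_t = A w_{t+1}$ .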
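
The block matrices can be assembled mechanically from the per-state data. The following sketch assumes every A(i) and K(i,j) is m x n and stores K(i,j) in a dictionary keyed by 0-based index pairs; the function names `block_K` and `block_diag_A` and the numbers are hypothetical. Note that block (j, i) of K holds p_ij K(i,j), so the transition matrix enters transposed.

```python
def zeros(r, c):
    """An r x c matrix of zeros, stored as a list of rows."""
    return [[0.0] * c for _ in range(r)]

def block_K(P, Kmats, m, n):
    """Block (j, i) of the result is p_ij * K(i, j); each block is m x n."""
    k = len(P)
    big = zeros(k * m, k * n)
    for i in range(k):
        for j in range(k):
            if P[i][j] == 0.0:
                continue                      # impossible transition
            for r in range(m):
                for q in range(n):
                    big[j * m + r][i * n + q] = P[i][j] * Kmats[(i, j)][r][q]
    return big

def block_diag_A(Amats, m, n):
    """Block-diagonal matrix with A(1), ..., A(k) on the diagonal."""
    big = zeros(len(Amats) * m, len(Amats) * n)
    for i, Ai in enumerate(Amats):
        for r in range(m):
            for q in range(n):
                big[i * m + r][i * n + q] = Ai[r][q]
    return big

# Illustrative data: k = 2 index-states, m = 1, n = 2.
P = [[0.0, 1.0], [0.5, 0.5]]
Kmats = {(0, 1): [[1.0, 2.0]], (1, 0): [[3.0, 4.0]], (1, 1): [[5.0, 6.0]]}
K_big = block_K(P, Kmats, 1, 2)
A_big = block_diag_A([[[1.0, 1.0]], [[1.0, 1.0]]], 1, 2)
```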
In addition, note that

(8)  $V^T(\psi, s, i) = \sum_{t=0}^{T} \beta^t c\,w_t$ ,

since

     $V^T(\psi, s, i) = E\left[\sum_{t=0}^{T} \beta^t c(i_t)\,u_t\right] = \sum_{t=0}^{T} \beta^t \sum_{i=1}^{k} \pi_{ti}\,c(i)\,u_{it} = \sum_{t=0}^{T} \beta^t c\,w_t$ .
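
Identity (8) can be checked by brute force on a tiny instance. The sketch assumes a deliberately simple case that is not in the text: m = n = 1 and A(i) = 1, so the constraint forces u_t = s_t and there is only one feasible policy. It compares the expectation taken over all index-state paths with the sum computed from the aggregated quantities w_t. All numbers and function names are illustrative.

```python
from itertools import product

P = [[0.5, 0.5], [0.25, 0.75]]    # transition probabilities p_ij
K = [[1.0, 2.0], [0.5, 1.0]]      # K[i][j] is the scalar K(i, j)
c = [1.0, 3.0]                    # scalar rewards c(i)
beta, T = 0.9, 3
s0, i0 = 2.0, 0                   # initial state (0-based index-state)

def value_by_paths():
    """E[sum_t beta^t c(i_t) u_t] by enumerating every realization."""
    total = 0.0
    for tail in product(range(2), repeat=T):
        path = (i0,) + tail
        prob, s, reward = 1.0, s0, 0.0
        for t in range(T + 1):
            u = s                              # A(i) = 1 forces u_t = s_t
            reward += beta ** t * c[path[t]] * u
            if t < T:
                prob *= P[path[t]][path[t + 1]]
                s = K[path[t]][path[t + 1]] * u
        total += prob * reward
    return total

def value_by_w():
    """sum_t beta^t c w_t with w_{j,t+1} = sum_i p_ij K(i,j) w_it."""
    w = [s0 if i == i0 else 0.0 for i in range(2)]
    total = 0.0
    for t in range(T + 1):
        total += beta ** t * sum(c[i] * w[i] for i in range(2))
        w = [sum(P[i][j] * K[i][j] * w[i] for i in range(2)) for j in range(2)]
    return total
```

The path enumeration grows exponentially in T, whereas the recursion on w_t is linear in T; that contrast is the practical reason for working with the aggregated variables.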


We have demonstrated how each policy corresponds to a feasible solution of the infinite horizon linear program:

(9)  maximize  $\liminf_{T \to \infty} \sum_{t=0}^{T} \beta^t c\,w_t$

     subject to  $A w_t = K w_{t-1}, \quad t \ge 1$
                 $w_t \ge 0, \quad t \ge 0$ .

Although each policy defines a feasible solution to this infinite horizon linear program, the converse is not true. There exist feasible solutions of (9) that cannot be generated by a feasible policy $\psi \in \Psi$. Consider this example:
with


A(l)  * A(2)  ■ A(3)  ■ K(l,2)  ■ K(1 


•3>  ■ (i  i 1) 


and  K(4,l) 


/-l  1 l\ 

A(4)  - 

\ 1 2 o) 

/o  o o\ 
\1  1 1/  * 

K(3,4)  > 

- (o  o o\ 

and  s 

[o  0 0) 

0 

(!)• 


i - 1 . 

O 


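For this system the feasible solutions of the infinite horizon linear program are

$t = 0$:  $w_0 = (w_{10}, 0, 0, 0)$ where $A(1)\,w_{10} = s_0$, $w_{10} \ge 0$.

$t = 1$:  $w_1 = (0, w_{21}, w_{31}, 0)$ where

          $A(2)\,w_{21} = \tfrac{1}{2} K(1,2)\,w_{10} = \tfrac{1}{2} A(1)\,w_{10} = \tfrac{1}{2} s_0$, $w_{21} \ge 0$,
          $A(3)\,w_{31} = \tfrac{1}{2} K(1,3)\,w_{10} = \tfrac{1}{2} A(1)\,w_{10} = \tfrac{1}{2} s_0$, $w_{31} \ge 0$.

$t = 2$:  $w_2 = (0, 0, 0, w_{42})$ where $A(4)\,w_{42} = K(2,4)\,w_{21} + K(3,4)\,w_{31}$, $w_{42} \ge 0$.

$t \ge 3$:  $w_t = (0, 0, 0, 0)$ .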
Notice there is a feasible solution of the infinite horizon linear program with $w_{42} = (0, 1, 1)'$.

Now let's examine the stochastic decision problem. At time 2, we shall surely be in index-state 4, i.e., $\pi_{24} = 1$, and due to the special structure of the example the state vector $s_2$ can take one of two possible values, depending on the value of $i_1$, each with probability 1/2.

A feasible policy must indicate what to do in either situation. Let $u_2(1)$ and $u_2(2)$ be the decisions in the two cases. The expected decision is $u_{42} = \tfrac{1}{2} u_2(1) + \tfrac{1}{2} u_2(2)$. However, notice that the first component of the three-dimensional vector $u_2(1)$ must be positive. Thus if a feasible policy generates $w_{42} = \pi_{24}\,u_{42}$, the first component of $w_{42}$ must be positive. We know, however, that a feasible solution of the infinite horizon linear program exists with the first component of $w_{42}$ equal to zero. Thus not every solution of the infinite horizon linear program is generated by a feasible policy in the stochastic decision problem.

5.  UPPER  BOUNDS

In Section 4 we demonstrated that an infinite horizon linear program could be derived whose optimal value must be greater than or equal to $V(s,i)$, the optimal value of the multi-stage stochastic decision problem. Let $U(s,i)$ be the optimal value of the infinite horizon linear program as a function of the initial condition. We know that $V(s,i) \le U(s,i)$. In this section we develop a linear program with optimal value $W(s,i)$ and show that under appropriate conditions $V(s,i) \le U(s,i) \le W(s,i)$.

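Let $\Pi$ be the set of all feasible solutions to the infinite horizon linear program; a sequence $\{w_t\} \in \Pi$ if and only if

(1)  $A w_0 = x_0, \quad w_0 \ge 0$
     $A w_t = K w_{t-1}, \quad w_t \ge 0, \quad t \ge 1$ .

Let $\Pi_f \subset \Pi$ be the set of solutions such that $\sum_{t=0}^{\infty} \beta^t w_t$ is finite. For $\{w_t\} \in \Pi_f$, define $w = \sum_{t=0}^{\infty} \beta^t w_t$ and note that

(2)  $c\,w \le U(s,i)$
     $(A - \beta K)\,w = x_0$
     $w \ge 0$ .

The equality follows by multiplying the constraint for period $t$ in (1) by $\beta^t$ and summing over $t$.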
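
The aggregation step can be verified numerically. The sketch below again assumes the simplest possible case, which is not in the text: scalar blocks with A(i) = 1, so A is the identity and the sequence w_t is fully determined by x_0. It accumulates a long truncation of the discounted sum and checks that the residual of the aggregated constraint is negligible. The data are illustrative and chosen so that the discounted sum converges.

```python
beta = 0.9
P = [[0.5, 0.5], [0.25, 0.75]]    # transition probabilities p_ij
K = [[1.0, 2.0], [0.5, 1.0]]      # scalar K(i, j)
x0 = [2.0, 0.0]                   # s_0 = 2 placed in the block of i_0 = 1

# Block matrix of Section 4: entry (j, i) is p_ij * K(i, j); A is the identity.
M = [[P[i][j] * K[i][j] for i in range(2)] for j in range(2)]

def next_w(w):
    """One step of A w_{t+1} = K w_t with A equal to the identity."""
    return [sum(M[j][i] * w[i] for i in range(2)) for j in range(2)]

# Truncated discounted sum w = sum_t beta^t w_t.
w_t, w_bar = x0[:], [0.0, 0.0]
for t in range(600):
    w_bar = [a + beta ** t * b for a, b in zip(w_bar, w_t)]
    w_t = next_w(w_t)

# Residual of (A - beta K) w = x_0.
residual = [w_bar[j] - beta * sum(M[j][i] * w_bar[i] for i in range(2)) - x0[j]
            for j in range(2)]
```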


Thus we can discover an upper bound on the value of the best solution in $\Pi_f$ by solving the finite linear program

(3)  Max  $c\,w$
     subject to  $(A - \beta K)\,w = x_0$
                 $w \ge 0$ .
Let $W(s,i)$ be the optimal value of (3). To show that $W(s,i)$ is an upper bound for $U(s,i)$ we must demonstrate that an optimal solution for the infinite horizon linear program can be found in the class $\Pi_f$. We must make two assumptions.


Assumption  II: 

An  optimal  solution  to  problem  (3)  exists. 


Assumption  III: 

There exist $(y, z)$ which satisfy

(4)  $y(A - \beta K) - z = c$

$z > 0$, and either $yA - c \ge 0$ or $yA \ge 0$.

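A consequence of Assumption III is that solutions not in $\Pi_f$ are infinitely bad. Let $\{w_t\} \in \Pi$; from (1) and (4) we can obtain for all $T$

(5)  $y x_0 = z \sum_{t=0}^{T} \beta^t w_t + \sum_{t=0}^{T} \beta^t c\,w_t + \beta^{T+1}\,yA\,w_{T+1}$
          $= z \sum_{t=0}^{T} \beta^t w_t + \sum_{t=0}^{T+1} \beta^t c\,w_t + \beta^{T+1}(yA - c)\,w_{T+1}$ .

If Assumption III is satisfied and $\{w_t\} \in \Pi \setminus \Pi_f$, then $z \sum_{t=0}^{T} \beta^t w_t \to +\infty$. However, nonnegativity of $yA$ or $yA - c$ allows us to write

(6)  $y x_0 \ge z \sum_{t=0}^{T} \beta^t w_t + \sum_{t=0}^{T} \beta^t c\,w_t$

or

     $y x_0 \ge z \sum_{t=0}^{T} \beta^t w_t + \sum_{t=0}^{T+1} \beta^t c\,w_t$ .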
(6) 
The term on the left is constant and the first term on the right diverges to $+\infty$; thus the second term on the right must diverge to $-\infty$. This indicates that the value of any solution $\{w_t\} \in \Pi \setminus \Pi_f$ is $-\infty$. Since, by Assumption II, the linear program has an optimal solution, we can conclude

(7)  $V(s,i) \le U(s,i) \le W(s,i)$ .
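
A one-state toy problem makes the role of Assumption III visible. The sketch below is an illustration under assumed data, not an example from the text: k = 1 with p_11 = 1, one state component and two activities. It checks that a candidate multiplier y satisfies (4), evaluates the right side of the first line of (5) along one feasible sequence, and compares y x_0 with the optimal value of the one-row program (3), found here by inspecting its two vertices.

```python
beta = 0.9
A = [1.0, 1.0]        # single row: m = 1, n = 2, and k = 1 with p_11 = 1
K = [0.5, 1.0]        # carry-over of each activity into the next state
c = [1.0, 1.5]        # reward of each activity
x0 = 2.0
y = 16.0              # candidate multiplier (hypothetical choice)

# z is defined by (4): y (A - beta K) - z = c.
z = [y * (A[j] - beta * K[j]) - c[j] for j in range(2)]
assumption_III = (all(zj > 0 for zj in z)
                  and all(y * A[j] - c[j] >= 0 for j in range(2)))

def sequence(T):
    """Feasible w_0, ..., w_{T+1} that always uses the second activity."""
    ws, s = [], x0
    for _ in range(T + 2):
        w = [0.0, s / A[1]]                      # A w_t = s_t
        ws.append(w)
        s = sum(K[j] * w[j] for j in range(2))   # s_{t+1} = K w_t
    return ws

def rhs_of_5(T):
    """Right side of the first line of (5) for the sequence above."""
    ws = sequence(T)
    head = sum(beta ** t * sum((z[j] + c[j]) * ws[t][j] for j in range(2))
               for t in range(T + 1))
    tail = beta ** (T + 1) * y * sum(A[j] * ws[T + 1][j] for j in range(2))
    return head + tail

# With one constraint and positive coefficients, (3) attains its optimum
# at one of the two vertices w = x0 / (A_j - beta K_j) e_j.
W = max(c[j] * x0 / (A[j] - beta * K[j]) for j in range(2))
```

In this instance y x_0 exceeds W, as weak duality requires, and the identity (5) holds for every horizon T because the chosen sequence is feasible.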

6.  OBTAINING  DECISIONS  FOR  THE  STOCHASTIC  MULTI-STAGE  LINEAR  PROGRAM

This section indicates how the theory developed in Sections 2-5 can be used to generate a decision for the stochastic optimization problem. We know that $W(s,i)$, the optimal value of

(1)  Max  $c\,w$
     subject to  $(A - \beta K)\,w = x_0$
                 $w \ge 0$ ,

is an upper bound for $V(s,i)$ and we hope that $V(s,i)$ is close to $W(s,i)$. Even if this is so, an approximation of $V(s,i)$ is not extremely useful; we must determine an initial decision that is consistent with that value. We show in this section how such an initial decision can be obtained.

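Without loss of generality we can assume that $i_0 = 1$, and that $p_{i1} = 0$ for $i = 1, 2, \ldots, k$. Thus we cannot return to the initial index-state. In this case problem (1) becomes (for $k = 3$):

Max  $c(1)\,w^1 + c(2)\,w^2 + c(3)\,w^3$

subject to

$\begin{pmatrix} A(1) & 0 & 0 \\ -\beta p_{12}K(1,2) & A(2) - \beta p_{22}K(2,2) & -\beta p_{32}K(3,2) \\ -\beta p_{13}K(1,3) & -\beta p_{23}K(2,3) & A(3) - \beta p_{33}K(3,3) \end{pmatrix} \begin{pmatrix} w^1 \\ w^2 \\ w^3 \end{pmatrix} = \begin{pmatrix} s \\ 0 \\ 0 \end{pmatrix}$

$w^1 \ge 0, \quad w^2 \ge 0, \quad w^3 \ge 0$ ,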
where $w^i = \sum_{t=0}^{\infty} \beta^t w_{it}$.

However, from the structure of $K$ it is obvious that $w_{1t} = 0$ for $t \ge 1$; thus $w^1 = w_{10}$, a feasible initial decision that attains the upper bound $W(s,i)$.
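
The procedure of this section can be sketched end to end on a toy instance. Everything numerical here is an assumption made for illustration: k = 3, one state component, two activities per state, A(i) = (1 1), and K(i,j) depending only on j. In place of a simplex code, the sketch enumerates the basic feasible solutions of the small program, which is only practical for tiny problems and presumes an optimal solution exists (Assumption II). The first block of the best vertex is then read off as the initial decision.

```python
from itertools import combinations

def solve(M, b):
    """Solve the square system M x = b by Gauss-Jordan; None if singular."""
    n = len(b)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * v for x, v in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]

def best_vertex(G, b, cost):
    """Best basic feasible solution of: max cost.w, G w = b, w >= 0."""
    rows, cols = len(G), len(G[0])
    best_val, best_w = None, None
    for basis in combinations(range(cols), rows):
        xb = solve([[G[r][j] for j in basis] for r in range(rows)], b)
        if xb is None or min(xb) < -1e-9:
            continue                         # singular or infeasible basis
        w = [0.0] * cols
        for j, v in zip(basis, xb):
            w[j] = max(v, 0.0)
        val = sum(cj * wj for cj, wj in zip(cost, w))
        if best_val is None or val > best_val:
            best_val, best_w = val, w
    return best_val, best_w

beta, s0 = 0.9, 1.0
P = [[0.0, 0.5, 0.5]] * 3                 # p_i1 = 0: state 1 is never re-entered
A_row = [1.0, 1.0]                        # A(i) = (1 1) for every i
K_to = {1: [1.0, 0.2], 2: [0.6, 0.1]}     # K(i, j) by 0-based destination j
cost = [0.0, 1.0] * 3                     # c(i) = (0, 1) for every i

# Row block j, column block i of A - beta K, with 1 x 2 blocks.
G = []
for j in range(3):
    row = []
    for i in range(3):
        a = A_row if i == j else [0.0, 0.0]
        kblk = K_to.get(j, [0.0, 0.0])
        row += [a[q] - beta * P[i][j] * kblk[q] for q in range(2)]
    G.append(row)
x0 = [s0, 0.0, 0.0]

W, w = best_vertex(G, x0, cost)
initial_decision = w[:2]                  # the block w^1, read as w_10
```

Because the first block row of the constraint matrix is (A(1), 0, 0), the block `initial_decision` always satisfies the first-period constraint A(1) w^1 = s, whatever the rest of the solution looks like.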

7.  SUMMARY  AND  OPEN  QUESTIONS

This  paper  has  presented  a simple  operational  method  of  finding  a good 
initial  decision  for  a multi-stage  stochastic  programming  problem  and  for 
calculating  an  upper  bound  on  the  optimal  value  of  the  stochastic  program. 

The  key  modeling  assumption  is  that  the  stochastic  evolution  of  the  system 
can  be  described  by  a Markov  chain.  The  hope  is  that  the  upper  bound  is 
nearly  exact  and  therefore  the  initial  decision  is  nearly  optimal.  In  a 
related  paper  [3],  a special  case  was  described  in  which  the  bound  is  exact, 
and  the  policy  is  optimal. 

Several  interesting  related  questions  remain  open.  When  does  an  optimal 
policy  exist  for  the  stochastic  optimization  problem?  Is  there  an  optimal 
stationary-Markov  (u  depends  only  on  i and  s)  policy?  When  does  the 
optimal  solution  of  the  infinite  horizon  linear  program  correspond  to  a 
realizable  policy  in  the  stochastic  program? 